CHARACTERIZATION OF THE YEAST TRANSCRIPTOME

This application is a continuation-in-part of co-pending application Serial No. 
09/012,031, filed January 22, 1998, the disclosure of which is incorporated by reference
herein. This invention was made with government support under CA57345 awarded
by the National Institutes of Health. The government has certain rights in the 
invention. 

TECHNICAL FIELD OF THE INVENTION 

This invention is related to the characterization of the expressed genes of the 
yeast genome. More particularly, it is related to the identification and use of previously 
unrecognized genes. 



BACKGROUND OF THE INVENTION 

It is by now axiomatic that the phenotype of an organism is largely determined 
by the genes expressed within it. These expressed genes can be represented by a 
"transcriptome," conveying the identity of each expressed gene and its level of 
expression for a defined population of cells. Unlike the genome, which is essentially 
a static entity, the transcriptome can be modulated by both external and internal
factors. The transcriptome thereby serves as a dynamic link between an organism's 
genome and its physical characteristics. 

The transcriptome as defined above has not been characterized in any eukaryotic 
or prokaryotic organism, largely because of technological limitations. However, some 
general features of gene expression patterns were elucidated two decades ago through
RNA-DNA hybridization measurements (Bishop et al., 1974; Hereford and Rosbash,
1977). In many organisms, it was thus found that at least three classes of transcripts
could be identified, with either high, medium, or low levels of expression, and the
number of transcripts per cell was estimated (Lewin, 1980). These data of course
provided little information about the specific genes that were members of each class. 
Data on the expression levels of individual genes have accumulated as new genes were 
discovered. However, in only a few instances have the absolute levels of expression 
of particular genes been measured and compared to other genes in the same cell type. 

Description of any cell's transcriptome would therefore provide new 
information useful for understanding numerous aspects of cell biology and 
biochemistry. 

SUMMARY OF THE INVENTION 

It is an object of the present invention to provide isolated DNA molecules and 
methods of using such molecules to affect the cell cycle and identify candidate drugs. 
These and other objects of the invention are achieved by providing the art with one 
or more of the embodiments described below. 

According to one embodiment of the invention an isolated DNA molecule is 
provided. It comprises a coding sequence of a yeast gene selected from the group 
consisting of NORF genes comprising a SAGE tag as shown in SEQ ID NOS:67-811.

According to another embodiment of the invention a method of using NORF 
genes is provided. The method is for affecting the cell cycle of a cell. The method 
comprises the step of administering to a cell an isolated DNA molecule comprising 
a coding sequence of a NORF gene whose expression varies by at least 10% between
any two phases of the cell cycle selected from the group consisting of log phase, S 
phase, and G2/M. 

In yet another embodiment of the invention a method for screening candidate 
antifungal drugs is provided. The method comprises the steps of contacting a test 
substance with a yeast cell and monitoring expression of a NORF gene whose 
expression varies by at least 10% between any two phases of the cell cycle selected 
from the group consisting of log phase, S phase, and G2/M, wherein a test substance
which modifies the expression of the yeast gene is a candidate antifungal drug.

In still another embodiment of the invention a method for identifying human 
genes which are involved in cell cycle progression is provided. The method comprises 
the step of contacting human DNA with a probe which comprises at least 14 
contiguous nucleotides of a NORF gene whose expression varies by at least 10% 
between any two phases of the cell cycle selected from the group consisting of log
phase, S phase, and G2/M. A human DNA sequence which hybridizes to the probe 
is identified as a sequence of a candidate human gene which is involved in cell cycle 
progression. 

The present invention provides probes which comprise at least 14 contiguous 
nucleotides of a NORF gene comprising a SAGE tag as shown in SEQ ID NOS:67-811.

The invention also provides an array of probes on a solid support. At least one 
probe in the array comprises at least 14 contiguous nucleotides of a NORF gene 
comprising a SAGE tag as shown in SEQ ID NOS:67-811.

Still another embodiment of the invention is a method of identifying a candidate 
drug as a member of a class of drugs having a characteristic effect on gene expression 
in a yeast cell. A yeast cell is contacted with a candidate drug. Expression of at least 
one NORF gene whose expression is affected by the class of drugs is monitored in the 
yeast cell. Detection of a difference in expression of the at least one NORF gene 
relative to expression in the absence of the candidate drug identifies the candidate 
drug as a member of the class of drugs. 

These and other embodiments of the invention, which will be apparent to those
of skill in the art upon reading the detailed disclosure provided below, make available
to the art hitherto unrecognized genes, and information about the expression of genes
globally at the organismal level.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1. Schematic of SAGE Method and Genome Analysis. In applying
SAGE to the analysis of yeast gene expression patterns, the 3'-most NlaIII site was
used to define a unique position in each transcript and to provide a site for ligation of
a linker with a BsmFI site. The type IIS enzyme BsmFI, which cleaves a defined
distance from its non-palindromic recognition site, was then used to generate a 15 bp
SAGE tag (designated by the black arrows), which includes the NlaIII site.
Automated sequencing of concatenated SAGE tags allowed the routine identification
of about a thousand tags per 36-lane sequencing gel. Once sequenced, the abundance
of each SAGE tag was calculated, and each tag was used to search the entire yeast
genome to identify its corresponding gene. The lower panel shows a small region of
Chromosome 15. Gray arrows indicate all potential SAGE tags (NlaIII sites) and
black arrows indicate 3'-most SAGE tags. The total number of tags observed for each
potential tag is indicated above (+ strand) or below (- strand) the tag. As expected, the
observed SAGE tags were associated with the 3' end of expressed genes.

Figure 2. Sampling of Yeast Gene Expression. Analysis of increasing amounts 
of ascertained tags reveals a plateau in the number of unique expressed genes. 
Triangles represent genes with known functions, squares represent genes predicted on 
the basis of sequence information, and circles represent total genes. 

Figure 3. Virtual Rot. (A) Abundance Classes in the Yeast Transcriptome.
The transcript abundance is plotted in reverse order on the abscissa, whereas the
fraction of total transcripts with at least that abundance is plotted on the ordinate. The
dotted lines identify the three components of the curve, 1, 2, and 3. This is analogous
to a Rot curve derived from reassociation kinetics where the product of initial RNA
concentration and time is plotted on the abscissa, and the percent of labeled cDNA
that hybridizes to excess mRNA is plotted on the ordinate. (B) Comparison of
Virtual Rot and Rot Components. Transitions and data from virtual Rot components
were calculated from the data in Figure 3A, while data for Rot components were
obtained from Hereford and Rosbash, 1977.

Figure 4. Chromosomal Expression Map for S. cerevisiae. Individual yeast 
genes were positioned on each chromosome according to their open reading frame 
(ORF) start coordinates. Abundance levels of tags corresponding to each gene are 
displayed on the vertical axis, with transcription from the + strand indicated above the 
abscissa and that from the - strand indicated below. Yellow bands at ends of the
expanded chromosome represent telomeric regions that are undertranscribed (see text
for details).

Figure 5. Northern Blot Analysis of Representative Genes. TDH2/3, TEF1/2,
and NORF1 are expressed relatively equally in all three states (lane 1, G2/M arrested;
lane 2, S phase arrested; lane 3, log phase), while RNR4, RNR2, and NORF5 are
highly expressed in S-phase arrested cells. The expression level observed by SAGE
(number of tags) is noted below each lane and was highly correlated with quantitation
of the Northern blot by PhosphorImager analysis (r² = 0.97).

TABLE LEGENDS 

Table 1. Highly Expressed Genes. Tag represents the 10 bp SAGE tag adjacent
to the NlaIII site; Gene represents the gene or genes corresponding to a particular tag
(multiple genes that match unique tags are from related families, with an average
identity of 93%); Locus and Description denote the locus name and functional
description of each ORF, respectively; Copies/cell represents the abundance of each
transcript in the SAGE library, assuming 15,000 total transcripts per cell and 60,633
ascertained transcripts.
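
By way of illustration only, the Copies/cell normalization described in the legend to Table 1 can be sketched in Python as follows. The function name copies_per_cell is hypothetical and is not part of the original disclosure; the two constants are the figures stated in the legend.

```python
# Illustrative sketch only; the function name is hypothetical.
TRANSCRIPTS_PER_CELL = 15_000  # assumed total mRNA molecules per cell (from the legend)
ASCERTAINED_TAGS = 60_633      # total SAGE tags ascertained (from the legend)


def copies_per_cell(tag_count, total_tags=ASCERTAINED_TAGS,
                    transcripts_per_cell=TRANSCRIPTS_PER_CELL):
    """Scale an observed tag count to an estimated number of transcripts per cell."""
    if total_tags <= 0:
        raise ValueError("total_tags must be positive")
    return tag_count * transcripts_per_cell / total_tags


if __name__ == "__main__":
    # The tag counts below are arbitrary example inputs, not data from the tables.
    for count in (1, 31, 255):
        print(count, round(copies_per_cell(count), 2))
```

Under these assumptions, a tag observed once corresponds to roughly 0.25 copies per cell.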

Table 2. Expression of Putative Coding Sequences. Table column headings are
the same as for Table 1.

Table 3. Expression of the most abundant NORF genes. SAGE Tag, Locus,
and Copies/cell are the same as for Table 1; Chr and Tag Pos denote the chromosome
and position of each tag; ORF Size denotes the size of the ORF corresponding to the
indicated tag. In each case, the tag was located within or less than 250 bp 3' of the
NORF.

Table 4. Expression of NORF genes. SAGE tag and Copies/cell are the same 
as for Table 1. Chr and Tag Pos denote the chromosome and position of each tag. 

Table 5. Gene expression changes in different cell cycle phases. L denotes log
phase; S denotes synthesis phase; G2/M denotes the mitotic phase. Tag Sequence
represents the 10 bp SAGE tag adjacent to the NlaIII site; "ratio L to S" denotes the
ratio of expression in log phase to expression in synthesis phase; "ratio S to G2/M"
denotes the ratio of expression in synthesis phase to expression in G2/M phase; "ratio
G2/M to L" denotes the ratio of expression in G2/M to log phase. #DIV/0! indicates
an increase in expression from 0; a value of 0 indicates a decrease in expression to 0;
a value of 1 indicates no change; a value less than 1 indicates a decrease in expression;
and a value greater than 1 indicates an increase in expression.

Table 6. Intergenic open reading frames that contain or are adjacent to observed 
SAGE tags. Copies/cell represents abundance of each mRNA transcript as in Table 
1. Positive expression level indicates the tag is on the + strand of the chromosome; 
Negative expression level indicates the tag is on the - strand. 

It is a discovery of the present invention that certain hitherto unknown genes
(the NORFs) exist and are expressed in yeast. These genes, as well as other
previously identified and previously postulated genes, can be used to study, monitor,
and affect phases of the cell cycle. The present invention identifies which genes are
differentially expressed during the cell cycle. Differentially expressed genes can be 
used as markers of phases of the cell cycle. They can also be used to affect a change 
in the phase of the cell cycle. In addition, they can be used to screen for drugs which 
affect the cell cycle, by affecting expression of the genes. Human homologs of these 
eukaryotic genes are also presumed to exist, and can be identified using the yeast 
genes as probes or primers to identify the human homologs. 

New genes termed NORFs (not previously assigned open reading frames) have
been found. They are uniquely identified by their SAGE tags. In addition, their entire
nucleotide sequences are known and publicly available. In general, these were not
previously identified as genes due to their small size. However, they have now been
found to be expressed.

Differentially expressed yeast genes are those whose expression varies by a 
statistically significant difference (to greater than 95% confidence level) within 
different growth phases, particularly log phase, S phase, and G2/M. Preferably the 
difference is at least 10%, 25%, 50%, or 100%. In some cases, differentially 
expressed genes are not expressed at detectable levels in one or more cell cycle phases 
as determined by SAGE analysis. Genes which have been found to have differential 
expression characteristics include: NORF Nos. 1, 2, 4, 5, 6, 17, 25, 27, TEF1/TEF2,
ENO2, ADH1, ADH2, PGK1, CUP1A/CUP1B, PYK1, YKL056C, YMR116C,
YEL033W, YOR182C, YCR013C, ribonucleotide reductase 2 and 4, and YJR085C.
Differential expression can be detected by any means known in the art, such as 
hybridization to specific probes or immunological assays. 
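
By way of illustration only, one way of applying the two criteria described above (a minimum percent difference and a confidence level greater than 95%) to SAGE tag counts from two libraries is sketched below in Python. The disclosure does not specify a statistical test; the pooled two-proportion z-test used here, the function names, and the default thresholds are assumptions made for the purpose of the sketch.

```python
import math


def percent_difference(count_a, total_a, count_b, total_b):
    """Percent change in relative tag abundance going from library A to library B."""
    freq_a = count_a / total_a
    freq_b = count_b / total_b
    if freq_a == 0:
        # Expression rising from zero has no finite percent change.
        return math.inf if freq_b > 0 else 0.0
    return 100.0 * (freq_b - freq_a) / freq_a


def two_proportion_p_value(count_a, total_a, count_b, total_b):
    """Two-sided p-value from a pooled two-proportion z-test (normal approximation).

    This test is an assumed choice; the source does not name one.
    """
    pooled = (count_a + count_b) / (total_a + total_b)
    variance = pooled * (1.0 - pooled) * (1.0 / total_a + 1.0 / total_b)
    if variance == 0:
        return 1.0
    z = (count_a / total_a - count_b / total_b) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2.0))


def is_differentially_expressed(count_a, total_a, count_b, total_b,
                                min_percent=10.0, alpha=0.05):
    """True if the change is at least min_percent and significant at level alpha."""
    change = abs(percent_difference(count_a, total_a, count_b, total_b))
    p_value = two_proportion_p_value(count_a, total_a, count_b, total_b)
    return change >= min_percent and p_value < alpha
```

For example, is_differentially_expressed(0, 20184, 50, 20034) evaluates a hypothetical tag that is absent from a library of 20,184 tags and observed 50 times in a library of 20,034 tags. For tags with very low counts the normal approximation is poor, and an exact test may be preferable.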

Isolated DNA molecules according to the invention contain less than a whole
chromosome and can be genomic or cDNA, i.e., lacking introns. Isolated DNA
molecules can comprise a yeast gene or a coding sequence of a yeast gene involved
in cell cycle progression, such as NORF genes which comprise SAGE tags as shown
in SEQ ID NOS:67-811. Isolated DNA molecules which comprise yeast genes or
coding sequences of yeast genes comprising SAGE tags as shown in SEQ ID
NOS:37-12,203 are also isolated DNA molecules of the invention. Isolated DNA molecules
can also consist of a yeast gene or a coding sequence of a yeast gene which comprises
a SAGE tag as shown in SEQ ID NOS:37-12,203 or 67-811.

Any technique for obtaining a DNA of known sequence may be used to obtain 
isolated DNA molecules of the invention. Preferably they are isolated free of other 
cellular components such as membrane components, proteins, and lipids. They can 
be made by a cell and isolated, or synthesized using PCR or an automatic synthesizer. 
Methods for purifying and isolating DNA are routine and are known in the art. 

To administer yeast genes to cells, any DNA delivery technique known in the
art may be used, without limitation. These include liposomes, transfection, mating,
transduction, transformation, viral infection, and electroporation. Vectors for particular
purposes and characteristics can be selected by the skilled artisan for their known
properties. Cells which can be used as gene recipients are yeast and other fungi,
mammalian cells, including human cells, and bacterial cells.

Antifungal drugs can be identified using yeast cells as described herein. 
Expression of a differentially expressed NORF gene can be monitored by any means 
known in the art. When a test substance modifies the expression of such a 
differentially expressed gene, for example by increasing or decreasing its expression, 
it is a candidate drug for affecting the growth properties of fungi and may be useful 
as an antifungal agent. Expression of more than one NORF gene can be monitored. 
For example, expression of 2, 3, 4, 5, 10, 15, 20, 30, 40, 50, 60, 75, 100, 150, 250,
300, 350, 400, 450, or 500 or more NORF genes can be monitored in single or
multiple assays. 

Because differentially expressed genes are likely to be involved in cell cycle 
progression, it is likely that these genes are conserved among species. The 
differentially expressed NORF genes identified by the present invention can be used 
to identify homologs in humans and other mammals by contacting DNA from these 
mammals with a probe which comprises at least 10 contiguous nucleotides of a 
differentially expressed NORF gene. The DNA can be genomic or cDNA, as is
known in the art. Means for identifying homologous genes among different species
are well known in the art. Briefly, stringency of hybridization can be reduced so that
imperfectly matching sequences hybridize. This can be in the context of, inter alia,
Southern blots, Northern blots, colony hybridization, or PCR. Any hybridization
technique which is known in the art can be used. A DNA sequence which hybridizes
to the probe is identified as a sequence of a candidate gene which is involved in cell
cycle progression.

Probes according to the present invention are isolated DNA molecules which 
have at least 10, and preferably at least 12, 14, 16, 18, 20, or 25 contiguous 
nucleotides of a particular NORF gene or other differentially expressed gene. The 
probes may or may not be labeled. They may be used, for example, as primers for 
PCR assays, or for detection of gene expression for Southern or Northern blots or in 
situ hybridization. Preferably the probes are immobilized on a solid support. The 
solid support can be any surface to which a probe can be attached. Suitable solid 
supports include, but are not limited to, glass or plastic slides, tissue culture plates, 
microtiter wells, tubes, or particles such as beads, including but not limited to latex, 
polystyrene, or glass beads. Any method known in the art can be used to attach
a probe to the solid support, including use of covalent and non-covalent linkages,
passive absorption, or pairs of binding moieties attached respectively to the probe and 
the solid support. 

More preferably, probes are present on an array so that multiple probes can 
simultaneously hybridize to a single biological sample. The probes can be spotted 
onto the array or synthesized in situ on the array. See Lockhart et al., Nature
Biotechnology, Vol. 14, December 1996, "Expression monitoring by hybridization
to high-density oligonucleotide arrays." A single array contains at least one NORF
probe, but can contain more than 100, 500 or even 1,000 different probes in discrete
locations. If desired, one or more NORF probe(s) present on the array can be
nucleotide sequences from a NORF gene which is differentially expressed during the
cell cycle.

Genes identified by the present invention which are differentially expressed
during the cell cycle can also be used to obtain gene expression profiles characteristic
of the response of the genes of a yeast cell to a particular drug or class of drugs.
Classes of drugs of particular interest for which gene expression profiles can be
generated include those drugs which affect cell cycle or other cell processes, such as
chemotherapeutic agents. If desired, gene expression profiles characteristic of more
than one drug of a particular class can be generated and used to make a composite
gene expression profile. For example, microtubule poison drugs such as vinblastine,
taxol, vincristine, and taxotere can be used to generate gene expression profiles
characteristic of microtubule poisons.

To generate a gene expression profile characteristic of a particular drug or class 
of drugs, a yeast cell is contacted with a particular drug or a member of a particular 
class of drugs. Expression of at least one yeast gene is monitored, either before and
after contacting or in the contacted cell and in another yeast cell which has not been
contacted with the drug. Genes which are monitored can be any yeast gene, including
NORFs. Preferably, these genes are differentially expressed during the cell cycle. For
example, yeast genes can be selected from genes comprising the SAGE tags shown
in Tables 3, 4, 5, and 6 (SEQ ID NOS:67-12,203). If desired, genes such as NORF Nos.
1, 2, 4, 5, 6, 17, 25, or 27, TEF1/TEF2, ENO2, ADH1, ADH2, PGK1,
CUP1A/CUP1B, PYK1, YKL056C, YMR116C, YEL033W, YOR182C, YCR013C,
ribonucleotide reductase 2 and 4, and YJR085C, can be used for monitoring
alterations in gene expression.

The expression of any number of these genes, such as 1, 2, 3, 4, 5, 10, 15, 20,
25, 30, 40, 50, 60, 75, 100, 150, 250, 500, 1000, 2000, 3000, 4000, 5000, or 5,500
genes, can be measured. It is particularly convenient to monitor expression of the
differentially expressed genes using nucleic acids which are immobilized on a solid
support or in an array, such as the gene arrays described above.

Many genes, particularly cell cycle genes, are likely to be conserved between 
yeast and mammals, including humans. Thus, gene expression profiles characteristic 
of a drug or class of drugs can be used to predict the effects of candidate drugs on 
human cells, by identifying the candidate drug as a member of a class of drugs whose 
characteristic gene expression profile is known. The candidate drugs can be 
pharmacologic agents already known in the art or can be compounds previously 
unknown to have any pharmacological activity. The candidate drugs can be naturally 
occurring or designed in the laboratory. They can be isolated from microorganisms, 
animals, or plants, and can be produced recombinantly or synthesized by chemical 
methods known in the art.

The effect of a candidate drug on expression of at least one gene whose 
expression is affected by the class of drugs is monitored. A gene expression profile 
obtained using the candidate drug which is similar to a gene expression profile for a 
particular drug or class of drugs identifies the candidate drug as a member of that class 
of drugs. 
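
By way of illustration only, the comparison of a candidate drug's gene expression profile with reference profiles for known drug classes can be sketched in Python as follows. The disclosure does not prescribe a similarity measure; the Pearson correlation coefficient, the threshold of 0.8, the function names, and the representation of a profile as a mapping from gene name to expression change are all assumptions made for the purpose of the sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def best_matching_class(candidate, references, threshold=0.8):
    """Return (class name or None, best correlation) for a candidate profile.

    candidate: dict mapping gene name -> expression change (e.g. log ratio).
    references: dict mapping class name -> profile dict of the same form.
    Only genes present in both profiles are compared.
    """
    best_name, best_r = None, -1.0
    for name, profile in references.items():
        genes = sorted(set(candidate) & set(profile))
        r = pearson([candidate[g] for g in genes], [profile[g] for g in genes])
        if r > best_r:
            best_name, best_r = name, r
    return (best_name if best_r >= threshold else None), best_r
```

A candidate whose best correlation falls below the threshold is left unassigned rather than forced into the nearest class.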

The effect of modifying particular substituents of a known drug or of a candidate 
drug can be similarly tested. Such methods are useful for determining whether 
alterations intended, for example, to increase solubility or absorption of a particular 
drug will have an unintended and possibly deleterious effect on genes which are 
differentially expressed during the cell cycle. 

The above disclosure generally describes the present invention. A more 
complete understanding can be obtained by reference to the following specific 
examples which are provided herein for purposes of illustration only, and are not 
intended to limit the scope of the invention. 

EXAMPLE

Summary 

We have analyzed the set of genes expressed from the yeast genome, herein 
called the transcriptome, using serial analysis of gene expression (SAGE). Analysis
of 60,633 transcripts revealed 4,665 genes, with expression levels ranging from 0.3
to over 200 transcripts per cell. Of these genes, 1,981 had known functions, while 
2,684 were previously uncharacterized. Integration of positional information with 
gene expression data allowed the generation of chromosomal expression maps, 
identifying physical regions of transcriptional activity, and identified genes that had 
not been predicted by sequence information alone. These studies provide insight into 
global patterns of gene expression in yeast and demonstrate the feasibility of genome- 
wide expression studies in eukaryotes. 

Results 

Characteristics and Rationale of SAGE Approach 

Several methods have recently been described for the high throughput
evaluation of gene expression (Nguyen et al., 1995; Schena et al., 1995; Velculescu
et al., 1995). We used SAGE (Serial Analysis of Gene Expression) because it can
provide quantitative gene expression data without the prerequisite of a hybridization
probe for each transcript. The SAGE technology is based on two basic principles
(Figure 1). First, a short sequence tag (9-11 bp) contains sufficient information to
uniquely identify a transcript, provided that it is derived from a defined location within
that transcript. Second, many transcript tags can be concatenated into a single
molecule and then sequenced, revealing the identity of multiple tags simultaneously. 
The expression pattern of any population of transcripts can be quantitatively evaluated 
by determining the abundance of individual tags and identifying the gene 
corresponding to each tag. 
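
By way of illustration only, the first principle (a short tag taken from a defined position identifies a transcript) and the counting of tags can be sketched in silico in Python as follows. The sketch takes the tag to be the 10 bp immediately 3' of the 3'-most NlaIII recognition site (CATG) in each transcript sequence; it omits linker ligation, ditag formation, concatenation, and sequencing, and the function names are hypothetical.

```python
from collections import Counter

NLAIII_SITE = "CATG"  # recognition sequence of the anchoring enzyme NlaIII


def extract_tag(transcript, tag_length=10):
    """Return the tag adjacent to the 3'-most NlaIII site, or None if unavailable."""
    seq = transcript.upper()
    site = seq.rfind(NLAIII_SITE)  # position of the 3'-most CATG
    if site == -1:
        return None  # transcript has no NlaIII site
    start = site + len(NLAIII_SITE)
    tag = seq[start:start + tag_length]
    # Require a full-length tag; shorter fragments are discarded.
    return tag if len(tag) == tag_length else None


def count_tags(transcripts, tag_length=10):
    """Tally the tags obtained from a collection of transcript sequences."""
    counts = Counter()
    for transcript in transcripts:
        tag = extract_tag(transcript, tag_length)
        if tag is not None:
            counts[tag] += 1
    return counts
```

The resulting tally gives the abundance of each tag, which can then be matched against genomic sequence to identify the corresponding gene.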

Genome-wide expression 

In order to maximize representation of genes involved in normal growth and cell-cycle 
progression, SAGE libraries were generated from yeast cells in three states: log phase, 
S phase arrested and G2/M phase arrested. In total, SAGE tags corresponding to 
60,633 total transcripts were identified (including 20,184 from log phase, 20,034 from 
S phase arrested, and 20,415 from G2/M phase arrested cells). Of these tags, 56,291
tags (93%) precisely matched the yeast genome, 88 tags matched the mitochondrial
genome, and 91 tags matched the 2 micron plasmid. 

The number of SAGE tags required to define a yeast transcriptome depends on 
the confidence level desired for detecting low abundance mRNA molecules. 
Assuming the previously derived estimate of 15,000 mRNA molecules per cell 
(Hereford and Rosbash, 1977), 20,000 tags would represent a 1.3 fold coverage even 
for mRNA molecules present at a single copy per cell, and would provide a 72% 
probability of detecting such transcripts (as determined by Monte Carlo simulations). 
Analysis of 20,184 tags from log phase cells identified 3,298 unique genes. As an 
independent confirmation of mRNA copy number per cell, we compared the 
expression level of SUP44/RPS4, one of the few genes whose absolute mRNA levels 
have been reliably determined by quantitative hybridization experiments (Iyer and 
Struhl, 1996), with expression levels determined by SAGE. SUP44/RPS4 was 
measured by hybridization at 75 +/- 10 copies/cell (Iyer and Struhl, 1996), in good 
accord with the SAGE data of 63 copies/cell, suggesting that the estimate of 15,000 
mRNA molecules per cell was reasonably accurate. Analysis of SAGE tags from S 
phase arrested and G2/M phase arrested cells revealed similar expression levels for 
this gene (range 52 to 55 copies/cell), as well as for the vast majority of expressed 
genes. As less than 1% of the genes were expressed at dramatically different levels 
among these three states (see below), SAGE tags obtained from all libraries were 
combined and used to analyze global patterns of gene expression. 

Analysis of ascertained tags at increasing increments revealed that the number
of unique transcripts plateaued at ~60,000 tags (Figure 2). This suggested that
generation of further SAGE tags would yield few additional genes, consistent with the 
fact that sixty thousand transcripts represented a four-fold redundancy for genes 
expressed as low as 1 transcript per cell. Likewise, Monte Carlo simulations indicated 
that analysis of 60,000 tags would identify at least one tag for a given transcript 97% 
of the time if its expression level was one copy per cell. 
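
By way of illustration only, the coverage and detection figures discussed above can be approximated analytically in Python as follows. The sketch treats each sequenced tag as an independent draw from a pool of 15,000 mRNA molecules per cell; this is an idealization and is not the Monte Carlo simulation referred to in the text, so its results (about 74% for 20,000 tags and about 98% for 60,000 tags) differ slightly from the reported 72% and 97%. The function names are hypothetical.

```python
def fold_coverage(n_tags, copies_per_cell=1.0, transcripts_per_cell=15_000):
    """Expected number of tags observed for a transcript at the given abundance."""
    return n_tags * copies_per_cell / transcripts_per_cell


def detection_probability(n_tags, copies_per_cell=1.0, transcripts_per_cell=15_000):
    """Probability of observing at least one tag for the transcript.

    Assumes each tag independently derives from the transcript with
    probability copies_per_cell / transcripts_per_cell.
    """
    p_single = copies_per_cell / transcripts_per_cell
    return 1.0 - (1.0 - p_single) ** n_tags


if __name__ == "__main__":
    for n in (20_000, 60_000):
        print(n, round(fold_coverage(n), 2), round(detection_probability(n), 3))
```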

The 56,291 tags that precisely matched the yeast genome represented 4,665 
different genes. This number is in agreement with the estimate of 3,000 to 4,000 
expressed genes obtained by RNA-DNA reassociation kinetics (Hereford and
Rosbash, 1977). These expressed genes included 85% of the genes with characterized
functions (1,981 of 2,340), and 76% of the total genes predicted from analysis of the 
yeast genome (4,665 of 6,121). These numbers are consistent with a relatively 
complete sampling of the yeast transcriptome given the limited number of 
physiological states examined and the large number of genes predicted solely on the 
basis of genomic sequence analysis. 

The transcript expression per gene was observed to vary from 0.3 to over 200 
copies per cell. Analysis of the distribution of gene expression levels revealed several 
abundance classes that were similar to those observed in previous studies using 
reassociation kinetics. A "virtual Rot" of the genes observed by SAGE (Figure 3A)
identified three main components of the transcriptome with abundances ranging over 
three orders of magnitude. A Rot curve derived from RNA-cDNA reassociation 
kinetics also contained three main components distributed over a similar range of 
abundances (Hereford and Rosbash, 1977). Although the kinetics of reassociation of 
a particular class of RNA and cDNA may be affected by numerous experimental 
variables, there were striking similarities between Rot and virtual Rot analyses (Figure 
3B). Because Rot analysis may not detect all transcripts of low abundance (Lewin, 
1980), it is not surprising that SAGE revealed both a larger total number of expressed 
genes and a higher fraction of the transcriptome belonging to the low abundance 
transcript class. 
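
By way of illustration only, the points of a virtual Rot curve of the kind described above can be computed from per-gene abundances as sketched below in Python. The sketch interprets "fraction of total transcripts with at least that abundance" as the abundance-weighted fraction, i.e. the share of all transcript molecules contributed by genes at or above a given abundance; that interpretation and the function name are assumptions.

```python
from itertools import groupby


def virtual_rot(abundances):
    """Return (abundance, cumulative fraction) pairs in order of decreasing abundance.

    abundances: per-gene transcript abundances (tag counts or copies per cell).
    The fraction is the share of all transcripts contributed by genes whose
    abundance is at least the given value.
    """
    values = sorted((a for a in abundances if a > 0), reverse=True)
    total = sum(values)
    if total == 0:
        return []
    curve = []
    running = 0.0
    for abundance, group in groupby(values):
        running += sum(group)  # add every gene sharing this abundance
        curve.append((abundance, running / total))
    return curve
```

Plotting the abundance in reverse order on the abscissa against the cumulative fraction on the ordinate gives a curve whose inflections can be compared with the components of a conventional Rot curve.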

Integration of Expression Information with the Genomic Map 

The SAGE expression data could be integrated with existing positional information 
to generate chromosomal expression maps (Figure 4). These maps were generated 
using the sequence of the yeast genome and the position coordinates of ORFs obtained 
from the Stanford Yeast Genome Database. Although there were a few genes that 
were noted to be physically proximal and have similarly high levels of expression, 
there did not appear to be any clusters of particularly high or low expression on any 
chromosome. Genes like histones H3 and H4, which are known to have coregulated
divergent promoters and are immediately adjacent on chromosome 14 (Smith and
Murray, 1983), had very similar expression levels (5 and 6 copies per cell,
respectively). The distribution of transcripts among the chromosomes suggested that
overall transcription was evenly dispersed, with total transcript levels being roughly 
linearly related to chromosome size (r² = 0.85, data not shown). However, regions
within 10 kb of telomeres appeared to be uniformly undertranscribed, containing on 
average 3.2 tags per gene as compared with 12.4 tags per gene for non-telomeric 
regions (Figure 4). This is consistent with the previously described observations of
"telomeric silencing" in yeast (Gottschling et al., 1990). Recent studies have reported
telomeric position effects as far as 4 kb from telomere ends (Renauld et al., 1993).

Gene Expression Patterns 

Table 1 lists the 30 most highly expressed genes, all of which are expressed at 
greater than 60 mRNA copies per cell. As expected, these genes mostly correspond 
to well characterized enzymes involved in energy metabolism and protein synthesis 
and were expressed at similar levels in all three growth states (Examples in Figure 5). 
Some of these genes, including ENO2 (McAlister and Holland, 1982), PDC1 (Schmitt
et al., 1983), PGK1 (Chambers et al., 1989), PYK1 (Nishizawa et al., 1989), and
ADH1 (Denis et al., 1983), are known to be dramatically induced in the glucose-rich
growth conditions used in this study. In contrast, glucose repressible genes such as
the GAL1/GAL7/GAL10 cluster (St John and Davis, 1979), and GAL3 (Bajwa et al.,
1988) were observed to be expressed at very low levels (0.3 or fewer copies per cell).

As expected for the yeast strain used in this study, mating type a specific genes, 
such as the a factor genes (MFA1, MFA2) (Michaelis and Herskowitz, 1988), and 
alpha factor receptor (STE2) (Burkholder and Hartwell, 1985) were all observed to be 
expressed at significant levels (range 2 to 10 copies per cell), while mating type alpha 
specific genes (MFα1, MFα2, STE3) (Hagen et al., 1986; Kurjan and Herskowitz,
1982; Singh et al., 1983) were observed to be expressed at very low levels (<0.3
copies/cell). 

Three of the highly expressed genes in Table 1 had not been previously
characterized. One contained an ORF with predicted ribosomal function, previously 
identified only by genomic sequence analysis. Analyses of all SAGE data suggested 
that there were 2,684 such genes corresponding to uncharacterized ORFs which were 
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transcribed at detectable levels. The 30 most abundant of these transcripts were 
observed more than 30 times, corresponding to at least 8 transcripts per cell (Table 2). 
The other two highly expressed uncharacterized genes corresponded to ORFs not 
predicted by analysis of the yeast genome sequence (NORF = Konannotated ORFV 
Analyses of SAGE data suggested that there were at least 160 NORF genes transcribed 
at detectable levels. The 30 most abundant of these transcripts were observed at least 
9 times (Table 3 and examples in Figure 5). 

Interestingly, one of the NORF genes (NORF5) was only expressed in S phase 
arrested cells and corresponded to the transcript whose abundance varied the most in 
the three states analyzed (> 49 fold, Figure 5). Comparison of S phase arrested cells 
to the other states also identified greater than 9 fold elevation of the RNR2 and RNR4 
transcripts (Figure 5). Induction of these ribonucleotide reductase genes is likely to
be due to the hydroxyurea treatment used to arrest cells in S phase (Elledge and Davis,
1989). Likewise, comparison of G2/M arrested cells identified elevation of RBL2
and dynein light chain, both microtubule associated proteins (Archer et al., 1995; Dick
et al., 1996). As with the RNR inductions, these elevated levels seem likely to be
related to the nocodazole treatment used to arrest cells in the G2/M phase. While
there were many relatively small differences between the states (for example, NORF1,
Figure 5), overall comparison of the three states revealed surprisingly few dramatic
differences; there were only 29 transcripts whose abundance varied more than 10 fold
among the three different states analyzed (Tables 4 and 5). 
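
The fold-variation criterion used above can be illustrated with a minimal Python sketch. The floor value substituted for a zero count, the transcript names, and the example counts are all hypothetical; the text does not state how unobserved transcripts were handled, and the sketch assumes the three libraries have already been scaled to a comparable total number of tags.

```python
def max_fold_variation(counts, floor=0.5):
    """Ratio of the largest to the smallest tag count across growth states.

    counts: tag counts for one transcript, one value per state.
    floor:  hypothetical stand-in for a zero count so the ratio stays finite.
    """
    adjusted = [max(c, floor) for c in counts]
    return max(adjusted) / min(adjusted)


# Hypothetical counts in (log phase, S phase arrested, G2/M arrested) libraries.
example = {
    "transcript_A": (0, 25, 0),
    "transcript_B": (12, 15, 10),
}

# Keep only transcripts that vary more than 10 fold among the three states.
variable = [name for name, c in example.items() if max_fold_variation(c) > 10]
```

With these illustrative inputs only the transcript seen in a single state passes the 10 fold cutoff; the choice of floor directly sets the fold value reported for such transcripts.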

A comprehensive analysis for NORF genes was performed using the SAGE 
data. Yeast genome intergenic regions were defined as regions outside annotated 
ORFs or the 500bp region downstream of annotated ORFs (yeast genome sequence 
and tables of annotated ORFs were obtained from SGD at 
http://genome-www.stanford.edu/Saccharomyces/). Based on sequence analysis a 
total of 9524 putative ORFs of 25-99 amino acids were present in the intergenic 
regions; 510 of these ORFs contain or are adjacent to observed SAGE tags (Table 6). 
Of the 60,633 SAGE tags analyzed, there were 302 unique SAGE tags either within
or adjacent to intergenic ORFs (100 bp upstream or 500 bp downstream of the ORF)
(Table 6). Note that in some cases, more than one NORF contains or is adjacent to the
SAGE tag. These tags matched the genome uniquely, were in the correct orientation,
and were expressed at levels greater than 0.3 transcript copies per cell. 
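
A search for putative ORFs of 25-99 amino acids, as described above, might be sketched as follows. This is an assumption-laden illustration rather than the procedure actually used: it takes an ORF to run from the first in-frame ATG to the next in-frame stop codon, counts the initiating methionine toward the length, and scans only the strand it is given (the reverse complement would be scanned separately). The function name and defaults are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def small_orfs(seq, min_aa=25, max_aa=99):
    """Return (start, end, n_aa) for ATG-to-stop ORFs on the given strand.

    start and end are 0-based and end-exclusive; end includes the stop codon.
    n_aa counts codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if min_aa <= n_aa <= max_aa:
                    found.append((start, i + 3, n_aa))
                start = None  # look for the next ATG in this frame
    return found


# Hypothetical intergenic fragment: a 30-codon ORF followed by a stop codon.
fragment = "CC" + "ATG" + "GCT" * 29 + "TAA" + "GG"
orfs = small_orfs(fragment)
```

Candidate ORFs found this way would then be compared with SAGE tag positions using the 100 bp upstream and 500 bp downstream windows given in the text.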

The expression level for each NORF shown in Table 6 corresponds to the 
number of mRNA transcript copies per cell. If the expression level is positive it 
means that the tag is on the + strand of the chromosome; if negative, the tag is on the 
- strand of the chromosome. 

Discussion 

Analysis of a yeast transcriptome affords a unique view of the RNA components 
defining cellular life. Comparison of gene expression patterns from altered 
physiologic states can provide insight into genes that are important in a variety of 
processes. Comparison of transcriptomes from a variety of physiologic states should 
provide a minimum set of genes whose expression is required for normal vegetative 
growth, and another set composed of genes that will be expressed only in response to 
specific environmental stimuli, or during specialized processes. For example, recent 
work has defined a minimal set of 250 genes required for prokaryotic cellular life
(Mushegian and Koonin, 1996). Examination of the yeast genome readily identified
homologous genes for 196 of these, over 90% of which were observed to be expressed 
in the SAGE analysis. Detailed analyses of yeast transcriptomes, as well as 
transcriptomes from other organisms, should ultimately allow the generation of a 
minimal set of genes required for eukaryotic life. 

Like other genome-wide analyses, SAGE analysis of yeast transcriptomes has 
several potential limitations. First, a small number of transcripts would be expected 
to lack an NlaIII site and therefore would not be detected by our analysis. Second, our
analysis was limited to transcripts found at least as frequently as 0.3 copies per cell.
Transcripts expressed in only a minute fraction of the cell cycle, or transcripts
expressed in only a fraction of the cell population, would not be reliably detected by
our analysis. Finally, mRNA sequence data are practically unavailable for yeast, and
consequently, some SAGE tags cannot be unambiguously matched to corresponding
genes. Tags which were derived from overlapping genes, or genes which have
unusually long 3' untranslated regions may be misassigned. Increased availability of
3' UTR sequences in yeast mRNA molecules should help to resolve the ambiguities.

Despite these potential limitations, it is clear that the analyses described here 
furnish both global and local pictures of gene expression, precisely defined at the 
nucleotide level. These data, like the sequence of the yeast genome itself, provide 
simple, basic information integral to the interpretation of many experiments in the 
future. The availability of mRNA sequence information from EST sequencing as 
well as various genome projects, will soon allow definition of transcriptomes from a 
variety of organisms, including human. The data recorded here suggest that a 
reasonably complete picture of a human cell transcriptome will require only about
10-20 fold more tags than evaluated here, a number well within the practical realm
achievable with a small number of automated sequencers. The analysis of global 
expression patterns in higher eukaryotes is expected, in general, to be similar to those 
reported here for S. cerevisiae. However, the analysis of the transcriptome in different 
cells and from different individuals should yield a wealth of information regarding 
gene function in normal, developmental, and disease states. 

Experimental Procedures 
Yeast cell culture 

The source of transcripts for all experiments was S. cerevisiae strain YPH499
(MATa ura3-52 lys2-801 ade2-101 leu2-Δ1 his3-Δ200 trp1-Δ63) (Sikorski and
Hieter, 1989). Logarithmically growing cells were obtained by growing yeast cells to
early log phase (3 x 10^6 cells/ml) in YPD (Rose et al., 1990) rich medium (YPD
supplemented with 6 mM uracil, 4.8 mM adenine and 24 mM tryptophan) at 30°C.
For arrest in the G1/S phase of the cell cycle, hydroxyurea (0.1 M) was added to early
log phase cells, and the culture was incubated an additional 3.5 hours at 30°C. For
arrest in the G2/M phase of the cell cycle, nocodazole (15 µg/ml) was added to early
log phase cells and the culture was incubated for an additional 100 minutes at 30°C.
Harvested cells were washed once with water prior to freezing at -70°C. The growth
states of the harvested cells were confirmed by microscopic and flow cytometric
analyses (Basrai et al., 1996).
SAGE protocol

The SAGE method was performed as previously described (Velculescu et al.,
1995; Kinzler et al., U.S. Patents 5,866,330 and 5,695,937), with exceptions noted
below. PolyA RNA was converted to double-stranded cDNA with a BRL synthesis
kit using the manufacturer's protocol except for the inclusion of primer
biotin-5'-T18-3'. The cDNA was cleaved with NlaIII (Anchoring Enzyme). As NlaIII
sites were observed to occur once every 309 base pairs in three arbitrarily chosen yeast
chromosomes (1, 5, 10), 95% of yeast transcripts were predicted to be detectable with
a NlaIII-based SAGE approach. After capture of the 3' cDNA fragments on
streptavidin coated magnetic beads (Dynal), the bound cDNA was divided into two
pools, and one of the following linkers containing recognition sites for BsmFI was
ligated to each pool:

Linker 1:
5'-TTTGGATTTGCTGGTGCAGTACAACTAGGCTTAATAGGGACATG-3' (SEQ ID NO:1)
5'-TCCCTATTAAGCCTAGTTGTACTGCACCAGCAAATCC[amino mod. C7]-3' (SEQ ID NO:2)

Linker 2:
5'-TTTCTGGTCGAATTCAAGCTTCTAACGATGTACGGGGACATG-3' (SEQ ID NO:3)
5'-TCCCCGTACATCGTTAGAAGCTTGAATTCGAGCAG[amino mod. C7]-3' (SEQ ID NO:4)

As BsmFI (Tagging Enzyme) cleaves 14 bp away from its recognition site, and 
the NlaIII site overlaps the BsmFI site by 1 bp, a 15 bp SAGE tag was released with
BsmFI. SAGE tag overhangs were filled-in with Klenow, and tags from the two pools
were combined and ligated to each other. The ligation product was diluted and then
amplified with PCR for 28 cycles with 5'-GGATTTGCTGGTGCAGTACA-3' (SEQ
ID NO:5) and 5'-CTGCTCGAATTCAAGCTTCT-3' (SEQ ID NO:6), as primers. The
PCR product was analyzed by polyacrylamide gel electrophoresis (PAGE), and the
PCR product containing two tags ligated tail to tail (ditag) was excised. The PCR
product was then cleaved with NlaIII, and the band containing the ditags was excised
and self-ligated. After ligation, the concatenated products were separated by PAGE
and products between 500 bp and 2 kb were excised. These products were cloned into
the SphI site of pZero (Invitrogen). Colonies were screened for inserts by PCR with
M13 forward and M13 reverse sequences located outside the cloning site as primers.
PCR products from selected clones were sequenced with the TaqFS DyePrimer
kits (Perkin Elmer) and analyzed using a 377 ABI automated sequencer (Perkin
Elmer), following the manufacturer's protocol. Each successful sequencing reaction
identified an average of 26 tags; given a 90% sequencing reaction success rate, this
corresponded to an average of about 850 tags per sequencing gel.
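
Because the cloned concatemers consist of ditags separated by NlaIII sites (CATG), tags can in principle be recovered from a sequencing read by splitting on the anchoring site. The Python sketch below illustrates the idea only; it is not the SAGE program group itself. The permitted ditag length range (20 to 26 bp) and the function names are hypothetical, the 10 bp tag length follows the data analysis described in this document, and a tag that happens to contain an internal CATG would be mis-split by this simple approach.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq[::-1].translate(COMPLEMENT)


def extract_tags(read, anchor="CATG", tag_len=10, min_ditag=20, max_ditag=26):
    """Split a concatemer read on the anchoring site and return its tags.

    Each ditag holds two tags joined tail to tail, so the second tag is read
    from the opposite strand (reverse complement of the ditag's last bases).
    Fragments outside the assumed ditag length range are skipped.
    """
    tags = []
    for ditag in read.upper().split(anchor):
        if min_ditag <= len(ditag) <= max_ditag:
            tags.append(ditag[:tag_len])
            tags.append(revcomp(ditag[-tag_len:]))
    return tags


# Hypothetical read: one well-formed ditag and one fragment that is too short.
tag_a, tag_b = "GGCGCAATTT", "TAAGTGATGA"
read = "CATG" + tag_a + "AC" + revcomp(tag_b) + "CATG" + "ACGT" + "CATG"
tags = extract_tags(read)
```

Identical ditags recovered from different positions would be counted only once before tallying tags, consistent with the handling of repeated ditags described in this document.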

SAGE data analysis 

Sequence files were analyzed by means of the SAGE program group 
(Velculescu et al., 1995), which identifies the anchoring enzyme site with the proper
spacing and extracts the two intervening tags and records them in a database. The
68,691 tags obtained contained 62,965 tags from unique ditags and 5,726 tags from
repeated ditags. The latter were counted only once to eliminate potential PCR bias of
the quantitation, as described (Velculescu et al., 1995). Of 62,965 tags, 2,332 tags
corresponded to linker sequences, and were excluded from further analysis. Of the
remaining tags, 4,342 tags could not be assigned, and were likely due to sequencing
errors (in the tags or in the yeast genomic sequence). If all of these were due to tag
sequencing errors, this corresponds to a sequencing error rate of about 0.7% per base
pair (for a 10 bp tag), not far from what we would have expected under our automated
sequencing conditions. However, some unassigned tags had a much higher than
expected frequency of A's as the last five base pairs of the tag (5 of the 52 most
abundant unassigned tags), suggesting that these tags were derived from transcripts
containing anchoring enzyme sites within several base pairs from their polyA tails.
Given the frequency of NlaIII sites in the genome (one in 309 base pairs),
approximately 3% of transcripts were predicted to contain NlaIII sites within 10 bp of
their polyA tails.
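
The tag accounting and the two estimates in the preceding paragraph can be checked with a short calculation. The sketch below uses only numbers stated in the text; the assumptions are that sequencing errors are independent at each of the 10 tag positions, and that the 3% figure follows from a simple ratio of the 10 bp window to the mean NlaIII spacing. The derivation actually used is not stated.

```python
# Counts taken from the text.
total_tags = 68_691
from_unique_ditags = 62_965
from_repeated_ditags = 5_726
linker_tags = 2_332
unassigned_tags = 4_342
tag_len = 10          # bp of tag adjacent to the NlaIII site
nlaiii_spacing = 309  # mean bp between NlaIII sites

# The two ditag classes should add up to the total number of tags obtained.
assert from_unique_ditags + from_repeated_ditags == total_tags

# Tags remaining after linker-derived tags are excluded.
analyzed_tags = from_unique_ditags - linker_tags

# Fraction of analyzed tags that could not be assigned to the genome.
unassigned_fraction = unassigned_tags / analyzed_tags

# Per-base error rate p such that a 10 bp tag carries at least one error
# with probability unassigned_fraction: 1 - (1 - p) ** tag_len.
per_base_error = 1 - (1 - unassigned_fraction) ** (1 / tag_len)

# Approximate chance that an NlaIII site lies within 10 bp of the polyA tail.
near_polya_fraction = tag_len / nlaiii_spacing
```

Under these assumptions the analyzed total is 60,633 tags, the per-base error rate comes out near 0.7%, and the polyA-proximal fraction is close to 3%, in line with the figures quoted above.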

As very sparse data are available for yeast mRNA sequences and efforts to date 
have not been able to identify a highly conserved polyadenylation signal (Irniger and
Braus, 1994; Zaret and Sherman, 1982), we used 14 bp of SAGE tags (i.e. the NlaIII
site plus the adjacent 10 bp) to search the yeast genome directly (yeast genome
sequence obtained from the Stanford yeast genome ftp site (genome-ftp.stanford.edu)
on August 7, 1996). Because only coding regions are annotated in the yeast genome,
and SAGE tags can be derived from 3' untranslated regions of genes, a SAGE tag was
considered to correspond to a particular gene if it matched the ORF or the region 500
bp 3' of the ORF (locus names, gene names and ORF chromosomal coordinates were
obtained from Stanford yeast genome ftp site, and ORF descriptions were obtained 
from MIPS www site (http://www.mips.biochem.mpg.de/) on August 14, 1996). 
ORFs were considered genes with known functions if they were associated with a 
three letter gene name, while ORFs without such designations were considered 
uncharacterized. 
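
The assignment rule described above (a tag corresponds to a gene if it falls within the ORF or within 500 bp 3' of it, on the same strand) can be sketched as follows. The `Orf` record, its field names, the coordinate convention (1-based, inclusive, start less than end regardless of strand), and the example ORFs are hypothetical and are not taken from the Stanford annotation files.

```python
from dataclasses import dataclass


@dataclass
class Orf:
    name: str
    start: int   # leftmost chromosomal coordinate (1-based, inclusive)
    end: int     # rightmost chromosomal coordinate (inclusive)
    strand: str  # "+" or "-"


def tag_matches_orf(orf, tag_pos, tag_strand, flank=500):
    """True if a tag lies in the ORF or within `flank` bp 3' of it.

    The 3' side is to the right of a "+" strand ORF and to the left of a
    "-" strand ORF. Tags on the opposite strand never match.
    """
    if tag_strand != orf.strand:
        return False
    if orf.strand == "+":
        return orf.start <= tag_pos <= orf.end + flank
    return orf.start - flank <= tag_pos <= orf.end


# Hypothetical ORFs used only to exercise the rule.
plus_orf = Orf("ORF_PLUS", 1000, 2000, "+")
minus_orf = Orf("ORF_MINUS", 5000, 6000, "-")
```

For example, a "+" strand tag at position 2400 would be assigned to `ORF_PLUS`, whereas one at position 2600 would fall outside the 500 bp window.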

As expected, SAGE tags matched transcribed portions of the genome in a highly 
non-random fashion, with 88% matching ORFs or their adjacent 3' regions in the 
correct orientation (chi-squared P value <10°°). In instances when more than one tag 
matched a particular ORF in the correct orientation, the abundance was calculated to 
be the sum of the matched tags. Tags that matched ORFs in the incorrect orientation 
were not used in abundance calculations. In instances when a tag matched more than
one region of the genome (for example an ORF and non-ORF region) only the
matched ORF was considered. In some cases the 15th base of the tag could also be 
used to resolve ambiguities. 
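
The abundance rules in the preceding paragraph amount to summing, for each ORF, the counts of all tags that match it in the correct orientation. A minimal Python sketch follows. The conversion to copies per cell is included only as an illustration: the figure of 15,000 mRNA molecules per cell is an assumption that is not stated in this section, and the function names and example data are hypothetical.

```python
from collections import defaultdict


def orf_abundance(tag_hits):
    """Sum tag counts per ORF, ignoring tags in the incorrect orientation.

    tag_hits: iterable of (orf_name, correct_orientation, tag_count).
    """
    totals = defaultdict(int)
    for orf_name, correct_orientation, tag_count in tag_hits:
        if correct_orientation:
            totals[orf_name] += tag_count
    return dict(totals)


def copies_per_cell(tag_count, total_tags, mrna_per_cell=15_000):
    """Scale a tag count to transcripts per cell.

    mrna_per_cell is an assumed value, not one given in this section.
    """
    return tag_count * mrna_per_cell / total_tags


# Hypothetical matches: two sense tags and one antisense tag for ORF_A.
hits = [
    ("ORF_A", True, 12),
    ("ORF_A", True, 3),
    ("ORF_A", False, 7),
    ("ORF_B", True, 1),
]
abundance = orf_abundance(hits)
```

A tag that matches both an ORF and a non-ORF region would be passed to `orf_abundance` only for the ORF, following the rule stated above.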

For the identification of NORF genes, only tags were considered that matched 
portions of the genome that were further than 500 bp 3' of a previously identified ORF 
and were observed at least two times in the SAGE libraries. 
Additional NORFs

SAGE Tag     SEQ ID NO   Chr   Tag position   Copies/cell
GGCGCAATTT   97          4     1108395        2
TAAGTGATGA   98          7     593382         2
TTGTTGAATT   99          10    608373         2
GAAGCAGTAA   100         3     155607         2
ACATATGTTA   101         4     916112         2
CCCTACACGG   102         6     223289         2
GTAATTGGAC   103         10    392099         2
ATCAGACAAA   104         14    687272         2
TTATGAAAGA   105         15    81263          2
ATTCGTTCTA   106         15    841970         2
AGCAGGAGTT   107         16    188350         2
TTCTATTAGG   108         2     418749         2
TGGATTTCAG   109         4     1224930        2
CAGATATAAT   110         5     52488          2
CTGTTTTGGG   111         11    374761         2
CATTTTTAGT   112         11    508212         2
TTGAAAAGAT   113         13    104160         2
TAAGCCCATC   114         13    251273         2
AGCGTCCTCA   115         15    832420         2
TTTAGTTAAT   116         2     477623         2
ATGGTAGCCA   117         3     56961          2
AATTAGACTA   118         3     162589         2
AGTGACTCTT   119         4     1490879        2
GGACTATAAG   120         5     251266         2
ACTTTTTCAG   121         10    159213         2
GTCATATAGT   122         13    158765         2
CAAGAAAGTG   123         13    171166         2
GTGGGAAAGG   124         13    804600         2
TACTTTATAT   125         16    366449         2
AATACCAGCG   126         3     175540
GCCTTGTATA   127         4     372624         1
GGTACATTCA   128         5     67152
GATTTCTCTG   129         5     187462         1
TAGTTGCTCC   130         7     317108         1
GTAAGAAATC   131         7     836202         1
CTTGGGCTAT   132         8     107992         1
AAATGGTGAT   133         11    558686         1
ATCATTTGGG   134         12    199358         1
CTGAACTTTA   135         12    283720         1
CCAGAAGGAG   136         13    652873
CCGGTTACTA   137         15    803663
CGATGAGAAG   138         15    1004369
AAACCGTCCC   139         16    199141
TCATTCATAC   140         2     164728
TATCTTTTTG   141         4     169784
TTAGAATAAT   142         4     603508
GTACGCTGTG   143         5     118089
TATATTAATT   144         6     64228
GTTCTTGCCT   145         7     939579         1
ATATAGCTGC   146         10    181144         1
CCAAAAAAAA   147         11    91785          1
GAACTCCACA   148         11    94125          1
CCTTCACTGC   149         11    374172         1
CACATCATAA   150         11    625896         1
GAAGTATTGA   151         12    603999         1
TGCGCGTATA   152         13    206410         1
GGGTAGTACT   153         13    671730         1
TAGTTTTGTC   154         15    33475          1
CAATTCCTAC   155         1     172182         0.8
TTTGATTTGA   156         2     46431          0.8
GGCTCTGGTT   157         2     414510         0.8
CAGAAATAGC   158         2     585130         0.8
CTGTTATTTT   159         2     616054         0.8
CGAAGTCAAA   160         2     680605         0.8
CTCTAGATAA   161         3     171584         0.8
AGTCAAAATG   162         4     192750         0.8
GCGAGTTTAG   163         4     691301         0.8
GCTCCAATAG   164         4     1131020        0.8
TTTATTTGAG   165         4     1237501        0.8
GTTATATTGA   166         4     1401803        0.8
TGGGTTGAAG   167         5     251266         0.8
ATTTTATTTG   168         5     447729         0.8
ATCATAAAAA   169         5     548612         0.8
TTATATAAAA   170         6     223182         0.8
CTACTTCTGC   171         8     34653          0.8
ATAAGACAGT   172         10    227802         0.8
TTCATAAGTT   173         10    471894         0.8
TAAATCTGAG   174         11    145617         0.8
CTGGTAGAAA   175         11    151174         0.8
CACGTACACA   176         11    403208         0.8
CCAAGATCAA   177         11    425882         0.8
AGCTTGTTCC   178         12    234966         0.8
CACATTCGTT   179         12    759853         0.8
CTTACATATA   180         12    789781         0.6
TCTATAGCAA   181         13    228936         0.8
CCTTTCTGAA   182         13    297985         0.8
CCTTTAGAAT   183         13    777999         0.8
AATTAACACC   184         13    842122         0.8
GCGCAGGGGC   185         14    440984         0.8
TGTTTATAAA   186         14    661710         0.8
AAAAGTCATT   187         15    32081          0.8
TTCGTAAACT   188         15    680625         0.8
TTTTTGGAGT   189         15    888343         0.8
AGGCATCTTG   190         16    250284         0.8
AAATCAAAAC   191         16    453890         0.8
AATTGACGAA   192         16    560169         0.8
TTGATGATTT   193         16    582360         0.8
CCTGTTTTTG   194         16    643476         0.8
TTTTTAAAAA   195         1     101436         0.5
AAGTTTGATC   196         1     199848         0.5
AGCACCTATG   197         2     46913          0.5
TGATTTATCC   198         2     418946         0.5
ACTGCATCTG   199         2     680860         0.5
CAAGTTAGGA   200         2     744770         0.5
ATACCCAATT   201         3     29939          0.5
AACTTTGTAT   202         3     30056          0.5
GCGGCGGGTG   203         3     41645          0.5
AAAATTGTTC   204         3     57108          0.5
TCAAGTACTC   205         3     157855         0.5
AACTGTATGC   206         3     223882         0.5
CTATCGGCCA   207         3     278840         0.5
ACAAGCCCAA   208         3     289917         0.5
GTACAGGGCT   209         4     93873          0.5
AAGATCATCG   210         4     254851         0.5
GAACTCCTGG   211         4     340891         0.5
GAACGAGAAG   212         4     371850         0.5
TTTTTAATAC   213         4     372058         0.5
TCTCCAGTTG   214         4     381712         0.5
AATACGTTAC   215         4     471791         0.5
ACGATTGGCT   216         4     509158         0.5
TGTTTATAAG   217         4     521709         0.5
CGTTTTCGTC   218         4     538839         0.5
TCGAACCTCT   219         4     578702         0.5
TCCACACACA   220         4     930972         0.5
CCGTGCGTGC   221         4     1324367        0.5
TTTCTTCAAC   222         5     116099         0.5
CCAAGTCTCG   223         5     159320         0.5
AGAGCGAATT   224         5     207517         0.5
TGTAGATTAT   225         5     280465         0.5
AAAAGTAGTT   226         5     286387         0.5
ACTTGGTATG   227         5     422942         0.5
TTAATGTTAT   228         5     544523         0.5
TACACGCGCG   229         5     544555         0.5
GGTCACTCCT   230         6     62983          0.5
AAGTGATGAA   231         6     76141          0.5
TTTATCTTGT   232         6     130327         0.5
AGTGATTGTT   233         6     256223         0.5
GCTTTGTTGT   234         7     72577          0.5
TCATTGATTC   235         7     110590         0.5
TTCACCGGAA   236         7     323655         0.5
ACTATTCTGT   237         7     423957         0.5
GGGCCAACCC   238         7     433787         0.5
AAAATATCTT   239         7     559397         0.5
TAGTAGTAAC   240         7     622201         0.5
AAGCGCACAA   241         7     735909         0.5
TCGCTGTTTT   242         7     800300         0.5
TGTATTTTTG   243         7     836202         0.5
CTAAACAAAG   244         7     836587         0.5
TAGGAAGAAA   245         7     905046         0.5
GGAAAAATTA   246         7     958839         0.5
TTTGGATAGT   247
CGTTTGTGTA   248
AGAAAAAAAC   249
TAAAGTCCAG   250
TAAGCAGATT   251
ATGAGCATTT   252
AGGTGCAAAA   253
TAACAAAGAG

CLAIMS

1. An isolated DNA molecule comprising a coding sequence of a yeast
gene selected from the group of NORF genes comprising a SAGE tag as shown in
SEQ ID NOS:67-811.

2. The isolated DNA molecule of claim 1 which is involved in cell cycle 
progression. 

3. The isolated DNA molecule of claim 2 wherein expression of the NORF 
gene varies by at least 10% between any two phases of the cell cycle selected from the
group consisting of: log phase, S phase, and G2/M. 

4. The isolated DNA molecule of claim 2 wherein expression of the NORF 
gene varies by at least 25% between any two phases of the cell cycle selected from the 
group consisting of: log phase, S phase, and G2/M. 

5. The isolated DNA molecule of claim 2 wherein expression of the NORF 
gene varies by at least 50% between any two phases of the cell cycle selected from the 
group consisting of: log phase, S phase, and G2/M. 

6. The isolated DNA molecule of claim 2 wherein expression of the NORF 
gene varies by at least 100% between any two phases of the cell cycle selected from 
the group consisting of: log phase, S phase, and G2/M. 

7. The isolated DNA molecule of claim 2 wherein expression of the NORF 
gene varies by a statistically significant difference (greater than 95% confidence level) 
between any two phases of the cell cycle selected from the group consisting of: log 
phase, S phase, and G2/M.

8. The isolated DNA molecule of claim 7 wherein the NORF gene is 
selected from the group consisting of NORF No. 1, 2, 4, 5, 6, 17, 25, and 27.

9. The isolated DNA molecule of claim 2 wherein the NORF gene is not 
expressed in at least one phase of the cell cycle selected from the group consisting of: 
log phase, S phase, and G2/M. 

10. The isolated DNA molecule of claim 1 which is genomic. 

11. The isolated DNA molecule of claim 1 which is cDNA.

12. A method of using NORF genes to affect the cell cycle, comprising the
step of:

administering to a cell an isolated DNA molecule comprising a coding
sequence of a NORF gene whose expression varies by at least 10% between any two
phases of the cell cycle selected from the group consisting of log phase, S phase, and
G2/M.

13. The method of claim 12 wherein the cell is a yeast cell. 

14. The method of claim 12 wherein the cell is a fungal cell. 

15. The method of claim 12 wherein the cell is a mammalian cell.

16. The method of claim 12 wherein the NORF gene is selected from the 
group consisting of NORF No. 1, 2, 4, 5, 6, 17, 25, and 27.

17. A method for screening candidate antifungal drugs, comprising the steps
of:

contacting a test substance with a yeast cell; 

monitoring expression of a NORF gene whose expression varies by at 
least 10% between any two phases of the cell cycle selected from the group consisting 
of log phase, S phase, and G2/M, wherein a test substance which modifies the 
expression of the yeast gene is a candidate antifungal drug. 

18. The method of claim 17 wherein the NORF gene is selected from the 
group consisting of NORF No. 1, 2, 4, 5, 6, 17, 25, and 27.

19. A method for identifying human genes which are involved in cell cycle 
progression, comprising the steps of: 

contacting human DNA with a probe which comprises at least 10 
contiguous nucleotides of a NORF gene whose expression varies by at least 10% 
between any two phases of the cell cycle selected from the group consisting of log 
phase, S phase, and G2/M phase, wherein a human DNA sequence which hybridizes 
to the probe is identified as a sequence of a candidate human gene which is involved 
in cell cycle progression. 

20. The method of claim 19 wherein the NORF gene is selected from the 
group consisting of NORF No. 1, 2, 4, 5, 6, 17, 25, and 27.

21. A probe comprising at least 14 contiguous nucleotides of a NORF gene 
comprising a SAGE tag as shown in SEQ ID NOS:67-811.

22. The probe of claim 21 wherein expression of the NORF gene varies by
at least 10% between any two phases of a cell cycle selected from the group consisting 
of: log phase, S phase, and G2/M. 

23. The probe of claim 22 wherein expression of the NORF gene varies by 
at least 25% between any two phases of the cell cycle selected from the group 
consisting of: log phase, S phase, and G2/M. 

24. The probe of claim 22 wherein expression of the NORF gene varies by 
at least 50% between any two phases of the cell cycle selected from the group 
consisting of: log phase, S phase, and G2/M. 

25. The probe of claim 22 wherein expression of the NORF gene varies by 
at least 100% between any two phases of the cell cycle selected from the group 
consisting of: log phase, S phase, and G2/M. 

26. The probe of claim 22 wherein the NORF gene is not expressed in at 
least one phase of the cell cycle selected from the group consisting of: log phase, S 
phase, and G2/M. 

27. The probe of claim 22 wherein expression of the NORF gene varies by 
a statistically significant difference (greater than 95% confidence level) between any 
two phases of the cell cycle selected from the group consisting of: log phase, S phase, 
and G2/M. 

28. The probe of claim 22 wherein the gene is selected from the group 
consisting of NORF No. 1, 2, 4, 5, 6, 17, 25, and 27.

29. The method of claim 17 wherein said step of monitoring expression is 
performed using nucleic acid molecules which are immobilized on a solid support. 

30. The method of claim 29 wherein the nucleic acid molecules are in an
array.

31. The method of claim 19 wherein a probe which comprises a portion of
the NORF gene is in an array on a solid support. 

32. An array of probes on a solid support wherein at least one probe 
comprises at least 14 contiguous nucleotides of a NORF gene comprising a SAGE tag 
as shown in SEQ ID NOS:67-811.

33. The array of claim 32 wherein the at least one NORF gene is involved
in cell cycle progression.

34. The array of claim 32 wherein the NORF gene is selected from the group 
consisting of NORF No. 1, 2, 4, 5, 6, 17, 25, and 27.

35. The array of claim 32 which comprises at least 100 probes of distinct 
sequence.

36. The array of claim 32 which comprises at least 500 probes of distinct 
sequence. 

37. The array of claim 32 which comprises at least 1,000 probes of distinct
sequence. 

38. A method of identifying a candidate drug as a member of a class of
drugs having a characteristic effect on gene expression in a yeast cell, comprising the
steps of:

contacting a yeast cell with a candidate drug; and

monitoring expression in the yeast cell of at least one NORF gene
whose expression is affected by the class of drugs, wherein detection of a difference
in expression of the at least one NORF gene in the yeast cell relative to expression in
the absence of the candidate drug identifies the candidate drug as a member of the
class of drugs.

39. The method of claim 38 wherein the step of monitoring expression is 
performed using nucleic acid molecules which are immobilized on a solid support.

40. The method of claim 39 wherein the nucleic acid molecules are in an
array.

41. The method of claim 38 wherein expression of two or more NORF genes
is monitored. 

42. The probe of claim 21 which is immobilized on a solid support.